A comprehensive DNA barcode inventory of Austria’s fish species

Austria is inhabited by more than 80 species of native and non-native freshwater fishes. Despite considerable knowledge about Austrian fish species, the latest Red List of threatened species dates back 15 years and a systematic genetic inventory of Austria’s fish species does not exist. To fill this gap, we employed DNA barcoding to generate an up-to-date and comprehensive genetic reference database for Austrian fish species. In total, 639 newly generated cytochrome c oxidase subunit 1 (COI) sequences were added to the 377 existing records from the BOLD database to compile a near-complete reference dataset. Standard sequence similarity analyses resulted in 83 distinct clusters, almost perfectly reflecting the expected number of species in Austria. Mean intraspecific distances of 0.22% were significantly lower than distances to closest relatives, resulting in a pronounced barcoding gap and unique Barcode Index Numbers (BINs) for most of the species. Four cases of BIN sharing were detected, pointing to hybridization and/or recent divergence, whereas in Phoxinus spp., Gobio spp. and Barbatula barbatula intraspecific splits, multiple BINs and consequently cryptic diversity were observed. The overall high identification success and clear genetic separation of most of the species confirm the applicability and accuracy of genetic methods for bio-surveillance. Furthermore, the new DNA barcoding data pinpoint cases of taxonomic uncertainty, which need to be addressed in further detail to more precisely assign genetic lineages and their local distribution ranges in a new national Red List.


Introduction
DNA barcoding was introduced as a suitable method for biological species discrimination in animals in 2003 [1], and since then the method has continued to receive unprecedented attention. For most animal groups, the region near the 5'-end of the cytochrome c oxidase subunit 1 (COI) is established as the standard barcoding marker. Despite certain valid reservations [e.g. 2-4], an enormous number of studies on various taxonomic groups (e.g., see [5] for plants, [6] for insects, [7,8] for amphibians and reptiles, [9] for fungi, and [10] for fish) have accumulated over the last two decades. One particular upside of DNA barcoding is the breadth of useful applications. When applied to fishes, it can be used to investigate freshwater [10] or marine species [11][12][13], to determine species regardless of their ontogenetic stage [14][15][16][17] or to identify animals from residual parts alone [18]. Furthermore, DNA barcoding data is increasingly used as a means for tracking catch records, food authenticity, mislabeling or fraud [19][20][21][22]. Moreover, freshwater ecosystems are among the most threatened throughout the world and freshwater species in Europe have experienced an 83% decline in populations over the last 50 years [23,24]. Habitat degradation, water pollution, river channel regulation, hydropower exploitation, invasive species and ultimately climate change entail a range of pressures that threaten freshwater biodiversity worldwide [24][25][26]. Furthermore, the high level of endemism within freshwater ecosystems, coupled with challenges in direct observation, requires tools for sound identification of species and evolutionarily significant units to implement conservation efforts [27,28].
Species discrimination is also critical for biological monitoring and conservation purposes, hence DNA barcoding has gained additional importance in the light of recent alerts of biodiversity loss across all terrestrial and aquatic habitats [29,30]. Furthermore, biological surveillance increasingly encourages non-invasive sampling techniques like environmental DNA (eDNA) approaches [31,32], which heavily rely on high-quality genetic reference databases in order to facilitate reliable read identification and species assignment. Tracking biodiversity, however, requires precise species determination, and while the identification of most adult (European) fishes can usually be achieved quite easily by experts, some morphologically challenging cases like the whitefishes (Coregonus spp.), minnows (Phoxinus spp.) or alien species like weatherfishes (Misgurnus spp.) [32][33][34][35][36] as well as the identification of juvenile fish remain difficult tasks [14][15][16][17]. In such cases, DNA barcoding might not necessarily replace classical morphology-based approaches as a stand-alone technique, but can aid as a complementary method to increase resolution [16,37,38]. However, in order to yield optimal identification results, DNA barcoding is heavily dependent on high-quality, deep-coverage reference libraries (e.g. the BOLD database [39]), which profit from the steady augmentation with unambiguously determined reference specimens [10]. Several national barcoding initiatives (such as GBOL, www.bolgermany.de; Barcoding Fauna Bavarica, barcoding-zsm.de/bfb; SWISSBOL, www.swissbol.ch; FINBOL, www.finbol.org; NORBOL, www.norbol.org) contribute their share and ensure continuity and the steady increase in reference data quality [40,41]. The Austrian Barcode of Life initiative (ABOL, www.abol.ac.at) is part of this international network, aiming to contribute to this global database and, concomitantly, to investigate native biodiversity.
Based on the latest Austrian Red List of endangered teleost fish and lamprey species from 2007 [42] as well as other literature on the Austrian fish fauna [44], approximately 85 fish species are present in Austria, 70 of which are considered native. However, these literature sources differ widely concerning some taxa. For example, the genus Coregonus accounts for 12 out of 85 species in [42], but only a single entity in [43], where it was considered to be a "species complex" due to taxonomic uncertainties. As the current Red List was compiled almost 15 years ago (last version from 2007) and new or alien species and lineages [34,35,45-48] have recently been recorded, the current ABOL project also provides a valuable source of data for an update of the current Red List of Austrian teleost fish and lampreys, and a timely overview of the current freshwater fish diversity of Austria. Comprehensive knowledge on fish diversity is key for designing appropriate conservation action plans and may also support initial assessment of the need for management actions to be taken against invasive species.
Taken together, this study aims to i) add unambiguously determined reference specimens of Austrian fish to the international barcode of life database (BOLD), ii) contribute to the current understanding of the Austrian fish fauna and investigate the extant diversity (loss of species in the wild, new invaders/introductions) and iii) test the discriminating power of DNA barcoding for Austrian fishes.

Material and methods
The cumulative combination of all teleost fish and lamprey species listed in [43,44] as well as the current Red List for Austrian freshwater fishes [42] was used to define the extant freshwater fish diversity in Austria. According to the literature, 70 out of 85 species are listed as native. Additionally, a newly described species of gudgeon [45] and an alien species of weatherfish [48] have been added to the known fish diversity. In order to comprehensively cover the Austrian species assemblage, the present dataset consists of two sources of barcode sequences: i) COI sequences of Austrian fish species already available from BOLD ([32,34,35,45-48] including unpublished records (iBOL data release)) and ii) new COI barcode sequences generated in the course of this study. At the time this dataset was compiled, 1,048 COI sequences of Austrian fishes were available on BOLD (22.03.2021). Of those, samples not identified to the species level as well as all samples with sequences less than 500 bp in length were excluded, leaving 377 BOLD sequences. For more in-depth analyses of potentially ambiguous taxa pinpointed by the initial investigation (see below), sequences from other regions of Europe, outside of Austria, were downloaded from BOLD and compiled into separate datasets for Phoxinus spp.
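The two exclusion criteria described above (records not identified to species level, and sequences shorter than 500 bp) can be illustrated with a minimal Python sketch. This is not the procedure used in the study: the `Record` structure, the field names and the name-based heuristic for "identified to species level" are assumptions made for illustration only.

```python
from dataclasses import dataclass

MIN_LENGTH_BP = 500  # length threshold stated in the text

# Qualifiers taken here to mean "not identified to species level" (assumption).
OPEN_NOMENCLATURE = {"sp.", "spp.", "cf.", "aff."}


@dataclass
class Record:
    record_id: str  # hypothetical identifier
    species: str    # taxon name; empty if no identification is available
    sequence: str   # COI nucleotide sequence, possibly containing gaps


def is_species_level(name: str) -> bool:
    """Heuristic: at least a binomial, without open-nomenclature qualifiers."""
    parts = name.split()
    if len(parts) < 2:
        return False
    return not any(part.lower() in OPEN_NOMENCLATURE for part in parts)


def sequence_length(sequence: str) -> int:
    """Number of bases, ignoring alignment gap characters."""
    return sum(1 for base in sequence if base != "-")


def filter_records(records):
    """Keep species-level records with at least MIN_LENGTH_BP bases."""
    return [
        r for r in records
        if is_species_level(r.species)
        and sequence_length(r.sequence) >= MIN_LENGTH_BP
    ]
```

In practice, the taxonomic rank would more reliably be read from the repository's own metadata than inferred from the name string.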
All sequences were edited manually using MEGA 6.06 [66] and uploaded to the BOLD database, and are accessible under the project 'ABOL-Barcoding of the Austrian fish and lampreys (BCAFL)'. The final dataset of both downloaded and newly generated sequences consisted of 1,016 sequences (DS-AFISH, dx.doi.org/10.5883/DS-AFISH) for subsequent analyses (see Table 1 for number of sequences per species). Visualization of sequence similarity clustering was conducted using the 'Taxon ID Tree' tool implemented on BOLD with the BOLD aligner algorithm. Maximum intraspecific (I max) and interspecific genetic distances (distance to nearest neighbor, DNN) were calculated with the 'Barcode Gap Analysis' tool also implemented on BOLD (K2P distance model, BOLD aligner, complete deletion for ambiguous base/gap handling). Furthermore, both distance-based ('Automatic Barcode Gap Discovery', ABGD [67], and 'Assemble Species by Automatic Partitioning', ASAP [68]) and tree-based ('Bayesian Poisson Tree Processes' model, bPTP [69]) species delimitation methods were applied. For ABGD, the alignment containing all sequences was downloaded from BOLD and uploaded to the ABGD webserver (https://bioinfo.mnhn.fr/abi/public/abgd/abgdweb.html). Analyses were run with the Kimura (K2P) TS/TV model with the preset parameters (Pmin: 0.001, Pmax: 0.1, Steps: 10, X (relative gap width): 1.5). The same procedure was conducted for ASAP, also run from a webserver (https://bioinfo.mnhn.fr/abi/public/asap/asapweb.html) with the default parameters. For the bPTP analysis, the phylogenetic input tree was inferred using the IQ-TREE webserver (http://iqtree.cibiv.univie.ac.at/) with the automatic substitution model and 1000 ultrafast bootstrap replicates [70].
The resulting tree was converted to Newick format in FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/) and uploaded to the bPTP webserver (https://species.h-its.org/ptp/) where the analysis was run with 100,000 MCMC generations, the thinning set to 100, a burn-in fraction of 0.1 and a random seed [69].
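To make the distance-based part of the methods concrete, the following Python sketch computes the Kimura two-parameter (K2P) distance, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions, and derives the two quantities compared in a barcode gap analysis: the maximum intraspecific distance and the distance to the nearest neighbor. It is an illustration, not the BOLD implementation: the function names are hypothetical, sequences are assumed to be pre-aligned, and sites with gaps or ambiguous bases are skipped per pair, which is a simplification of the complete-deletion setting used in the study.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """K2P distance between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (pairwise deletion; a simplification, see lead-in).
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def barcode_gap(records):
    """records: list of (species, aligned_sequence) tuples.

    Returns {species: (max_intraspecific, distance_to_nearest_neighbor)}.
    Species with a single sequence get a maximum intraspecific distance of 0.
    """
    max_intra = {}
    nearest = {}
    for (sp_a, seq_a), (sp_b, seq_b) in combinations(records, 2):
        d = k2p_distance(seq_a, seq_b)
        if sp_a == sp_b:
            max_intra[sp_a] = max(max_intra.get(sp_a, 0.0), d)
        else:
            for sp in (sp_a, sp_b):
                nearest[sp] = min(nearest.get(sp, math.inf), d)
    species = {sp for sp, _ in records}
    return {
        sp: (max_intra.get(sp, 0.0), nearest.get(sp, math.inf))
        for sp in species
    }
```

A species shows a barcode gap in this sense when the first value of its tuple is smaller than the second; distances are returned as proportions and would be multiplied by 100 to obtain percentages.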

Results
From the 689 samples covering all but one of the extant families (only Anguillidae is missing), 96% of the genera and 95% of all fish species present in Austria (based on [42][43][44]), 639 COI barcodes ranging from 512 to 700 bp in length were generated, representing an overall sequencing success rate of 93%. All sequences are accessible on BOLD (project code 'BCAFL') and GenBank (ON097269-ON097906). The overall dataset (1,016 sequences), including downloaded records from Austrian fish samples, covers a total of 94% of all families, 98% of all genera and 96% of all species present in Austria. The sequence similarity clustering resulted in 84 distinct clades largely mirroring morphological species identification and 83 Barcode Index Numbers (BINs, Fig 2).
One specimen originally identified as Prussian carp (Carassius gibelio) was quite divergent from other alleged C. gibelio samples. A BLAST search in BOLD/GenBank indicated, with 100% sequence similarity, that this divergent haplotype sampled in Schwarzaubach in Styria most likely represents the Ginbuna, Carassius langsdorfii, a species hitherto unknown for Austria. In addition to this new record, discordances between currently accepted species, DNA  . These results were also largely reflected by the analysis of genetic distances (Table 1). With mean intra- and interspecific distances of 0.22 and 6.49% respectively, the barcode gap (i.e., interspecific distances exceeding intraspecific distances) was well reflected for most of the species (Fig 3). Only Blicca bjoerkna (maximum intraspecific distance, I max: 4.68, due to a single morphologically clear B. bjoerkna specimen with introgressed Abramis brama mtDNA), the species/lineages of Phoxinus spp. (I max: 6.28) and Eudontomyzon mariae (I max: 5.25) showed higher intraspecific than interspecific distances. Additionally, distances to conspecifics exceeding 1.0% were also detected within Alburnoides bipunctatus, Barbatula barbatula, Chondrostoma nasus, Cottus gobio, Gobio spp., Perca fluviatilis, Romanogobio carpathorossicus, Rutilus virgo, Scardinius erythrophthalmus, Squalius cephalus, Tinca tinca and Thymallus thymallus (Table 1). However, except for Gobio spp. and Barbatula barbatula, these cases did not result in additional BINs. Similar results were also obtained from the other species delimitation analyses (see S1 Table). ABGD resulted in 88 species in the initial and 90 species in the recursive partition using a prior maximal distance of P = 0.0129. ASAP, on the other hand, reported 65-91 partitions/species based on the ten best partitioning schemes regarding the ASAP score.
Even though the exact grouping of samples/species varies slightly between the individual priors and partitions, the overall patterns are the same, e.g., Gobio gudgeons are lumped into two groups despite the three lineages found by [47], Phoxinus minnows result in at least three distinct groups, and Ameiurus nebulosus and A. melas result in different groups despite their shared BIN. Finally, the maximum likelihood partitioning of the tree-based bPTP resulted in 88 species. Analysis of available pan-European stone loach data revealed at least five distinct lineages (and BINs) of Barbatula barbatula in Europe (Fig 4). Two of those lineages consist solely of samples from Germany or Russia, while the other three lineages contain samples from several countries, reflecting a geographical pattern with an eastern (Germany and Austria), Danubian, and Northeastern European clade. Interestingly, the Danubian clade branches off from B. vardarensis native to Greece and North Macedonia, with which it shares a common BIN (BOLD:AAA1243).
Minnows of the genus Phoxinus, however, revealed a far more complex pattern based on their COI sequences. Besides country- and Balkan-specific MOTUs, four Central European lineages containing samples from several countries including Austria were found (Fig 5, S1 Table). Assignment of species names to these molecular taxonomic units (MOTUs) proved difficult, as each cluster contained specimens of various determinations (e.g. BOLD:ADL2661 contained Phoxinus sp., P. phoxinus and P. marsilii). Nonetheless, our results are wholly congruent with the presence of more than one species of Phoxinus in Austria and consequently also in Europe [35].

Discussion
In this study, we present an almost complete DNA barcode reference inventory for Austrian fishes. Among the 639 newly generated COI barcode sequences, only the European eel (Anguilla anguilla) as well as two sturgeon species, namely the Russian sturgeon (Acipenser gueldenstaedtii) and the ship sturgeon (Acipenser nudiventris), which have also been listed for Austria [43], are missing. For the two former species, PCRs (of old museum tissue) were unsuccessful; for the latter species, no samples could be obtained. For all species, two or more samples were obtained, except for the racer goby (Babka gymnotrachelus), Balon's ruffe (Gymnocephalus baloni), the stellate sturgeon (Acipenser stellatus), the blue bream (Ballerus ballerus), the sunbleak (Leucaspius delineatus) and the bighead carp (Hypophthalmichthys nobilis), for which only a single sample was available. Whitefish (Coregonus spp.) were not treated as distinct species in our study, as there is no consensus yet on whether the different forms found in the different lakes represent different species or ecotypes and because previous studies have shown that divergence of these species/ecotypes is too recent to be fully resolved by mtDNA data [73,74]. These issues are further complicated by hybridization with closely related introduced species throughout their ranges [73,74]. Similar to previous studies [10], analysis of the DNA barcoding data largely mirrors the known national species inventory. However, we found a few cases of BIN sharing and deep intraspecific divergence in our new dataset, potentially indicating cryptic diversity and/or new species records for Austria.

Taxa sharing BINs
BIN sharing was detected in two species pairs and one trio of species: i) Leuciscus leuciscus and Leuciscus idus, ii) Ameiurus nebulosus and Ameiurus melas and iii) Carassius auratus, Carassius gibelio and Carassius langsdorfii. For L. leuciscus and L. idus, hybridization and mitochondrial replacement have been reported [75], resulting in a shared common haplotype and consequently the same BIN (BOLD:AAD5733). The black bullhead (Ameiurus melas) and the brown bullhead (A. nebulosus) shared the same BIN (BOLD:AAA7255), even though they are clearly separated in the NJ tree (see Fig 2) and other species delimitation analyses. However, this pattern is not an artefact of the Austrian samples alone, but a general pattern evident on BOLD, as this particular BIN is composed nearly equally of A. melas and A. nebulosus samples (https://www.boldsystems.org/index.php/Public_BarcodeCluster?clusteruri=BOLD:AAA7255), underscoring the shallow divergence between the two species. The two species can be clearly distinguished by morphological characters [76], but introgressive hybridization has been reported repeatedly [77 and references therein] and could be an additional problem for molecular delimitation. Furthermore, genetic distances (DNN: 2.75) between these two taxa, albeit high enough to support two distinct species, are fairly low compared to most species. Thirdly, the Prussian carp (Carassius gibelio) and the goldfish (Carassius auratus) share the same BIN with C. langsdorfii. All three species belong to the C. auratus species complex and have long been considered different subspecies of C. auratus, but molecular genetic analyses indicated their distinctness, despite shallow divergence (e.g., [78,79]), a pattern that we also find in our data (see e.g. NJ tree in Fig 2).
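The two kinds of discordance reported in this study, BINs shared by several species and single species spread over several BINs, can be detected from a simple list of (species, BIN) assignments. The Python sketch below illustrates that bookkeeping; the function name and the input layout are assumptions, and it is not the tool used on BOLD.

```python
from collections import defaultdict


def bin_discordance(records):
    """records: iterable of (species, bin_id) tuples, one per specimen.

    Returns two dictionaries:
      shared_bins   {bin_id: [species, ...]} for BINs holding more than one species
      split_species {species: [bin_id, ...]} for species found in more than one BIN
    """
    species_by_bin = defaultdict(set)
    bins_by_species = defaultdict(set)
    for species, bin_id in records:
        species_by_bin[bin_id].add(species)
        bins_by_species[species].add(bin_id)
    shared_bins = {
        b: sorted(s) for b, s in species_by_bin.items() if len(s) > 1
    }
    split_species = {
        s: sorted(b) for s, b in bins_by_species.items() if len(b) > 1
    }
    return shared_bins, split_species
```

Applied to assignments such as Leuciscus leuciscus and L. idus in BOLD:AAD5733, or Barbatula barbatula in BOLD:AAA1239 and BOLD:AAA1243, the first pair would be reported as a shared BIN and the stone loach as a split species. Neither flag is a taxonomic conclusion by itself; each marks a case for closer inspection.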

Cases of deep intraspecific divergence
In addition to the few taxa sharing BINs, we found three cases of deep divergence, i.e. in the gudgeons of the genus Gobio, in the stone loach, Barbatula barbatula, and in the minnows of the genus Phoxinus.
Gudgeons of the genus Gobio in Austria comprise three distinct mitochondrial lineages that were also resolved as distinct BINs (BOLD:AAC5607, BOLD:ABY6890 and BOLD:ADH1249), which is in sharp contrast to [42,43], who only list one species, G. gobio, and [44], who suggest the presence of two species, G. gobio and G. obtusirostris, for the Austrian Danube system with a potential hybrid zone in the Upper Danube. A recent detailed study [47] found that the three mitochondrial lineages present in Austria correspond to G. gobio, G. obtusirostris and a third lineage that is closely related to other Gobio species from the Balkans. Patterns of genetic diversity suggest that these originally allopatric lineages/species expanded their distribution recently (probably post-glacially) to come into secondary contact and hybridize in the (Austrian) Danube system, thus forming a large hybrid zone in Austria. Even though there seems to be a cline in the relative frequency of the distinct haplogroups from the upper to the lower parts of the Danube system [47,80], the distribution of these lineages/species throughout Austria (and adjacent countries) is currently unresolved and particularly complicated. Barbatula barbatula poses another ambiguous case, where sequences from the 17 morphologically identified samples can be allocated to two separate clusters in the NJ tree, forming two BINs (BOLD:AAA1239 and BOLD:AAA1243). This result is partly in line with the three clades recovered by [10], who also found high levels of divergence (<7.02% sequence divergence), potentially indicating cryptic species. The two lineages recovered in Austrian samples (4.66% divergence) are part of the eastern as well as the southern (Danubian) lineage [10] (Fig 4). This pattern also becomes evident when looking at the pan-European dataset (Fig 4). In addition to the Central European lineages, two Eastern/Northeastern lineages were recovered.
This finding is consistent with previous studies [10,81], which also found pronounced structure based on other markers, but did not include Northern European samples. Furthermore, this pattern is similar to what has been observed in gudgeons of the genus Gobio [47], with separate glacial refugia and post-glacial secondary contact and admixture. Similarly, additional nuclear genetic or genomic data would be required to comprehensively disentangle the complex pattern observed in the mitochondrial data.
The most complex pattern was found in the genus Phoxinus (the European minnow species complex). While [44] reported Phoxinus phoxinus and P. lumaireul for Central Europe, [34,35,46] identified four species and three additional lineages of Phoxinus in Austria. These are Phoxinus marsilii and P. lumaireul (represented by three different subclades), P. csikii and P. phoxinus (introduced). Discriminating between Phoxinus species and disentangling their respective distribution ranges and geographical origins is impeded by subtle morphological differences as well as small interspecific genetic variation, which cannot be detected by DNA barcoding. Species delimitation is further complicated by a long and irreproducible history of stocking and translocation as well as hybridization [35]; thus, further in-depth morphological and genetic/genomic investigations are needed.

First record of ginbuna, Carassius langsdorfii, for Austria
Two species of Carassius, the Crucian carp (C. carassius) and the Prussian carp (C. gibelio), are native to Europe. Additionally, the goldfish (C. auratus) was introduced in the 17th century as an ornamental fish and has established feral populations throughout Europe (e.g., [44,82,83]), a pattern mirrored by more recent introductions of eastern Asiatic strains of C. gibelio [84,85]. Since 2000, another non-native Carassius species, C. langsdorfii, originally distributed in Japan, has been reported from several European countries [82,86,87], most likely introduced unintentionally together with imported koi carps (Cyprinus rubrofuscus) [86]. As this species has hitherto not been reported for Austria, our finding of C. langsdorfii in the Schwarzaubach in Styria is the first evidence for its occurrence in Austria. Frequent hybridization among Carassius species, and between Carassius and other cyprinid species, as well as the presence of both sexually reproducing and gynogenetic populations complicate species identification in this genus. In fact, the only species to be reliably identified based on morphology is C. carassius, whereas genetic data are indispensable for identifying the other species in the genus (e.g. [82]). Indeed, knowledge about the present distribution of C. langsdorfii in Europe is almost exclusively based on mtDNA data [87]. However, a caveat of this strategy is that Carassius species have a high propensity to hybridize, and thus hybridization and introgression might lead to erroneous species identifications when based on mtDNA alone. Nonetheless, the discovery of a C. langsdorfii haplotype at least confirms the presence of C. langsdorfii mtDNA in Austria. Whether our specimen is indeed C. langsdorfii or a hybrid will have to be confirmed by additional, ideally nuclear genetic/genomic data. Phenotypically, this individual has a lower body (with fewer scale rows) than C. gibelio sensu stricto caught at the same site (see S1 Fig).
The specimen also differed from C. gibelio sensu stricto specimens by its lighter ventral and darker dorsal side (compare with [86]), suggesting it might indeed be C. langsdorfii.

Nomenclatural issues
Uncertainties in nomenclature such as in the above-mentioned example of C. langsdorfii, but also taxonomic revisions or even 'under-studied' groups, constitute a non-negligible issue for online repositories such as BOLD as well as for museum collections. This becomes apparent when, e.g., looking at gudgeons. Both [42] and [43] listed Gobio kesslerii as present in Austria, whereas [44] already used Romanogobio kesslerii. According to [45], however, the correct species name should be Romanogobio carpathorossicus, and here we follow this suggestion but note that R. carpathorossicus is listed as a synonym of R. kesslerii in Eschmeyer's catalogue of fishes [71]. A similar situation is found in gudgeons of the genus Gobio, where [42,43] only list G. gobio, whereas [44] report G. gobio and G. obtusirostris from the Danube system with the potential existence of a hybrid zone. The most recent work by [47], however, found three distinct lineages (likely corresponding to G. gobio, G. obtusirostris and a third, Balkans-derived lineage), to which we also adhere in this study and which was confirmed by [80]. The distribution of these lineages throughout Austria (and adjacent countries) is currently unresolved, and further complicated by high morphological variability and hybridization [47].
Systematics and taxonomy change over time simply due to the accumulation of new or more comprehensive data [45,88-91]. Therefore, museum collections as well as digital (sequence) repositories need to be periodically updated to reflect currently accepted nomenclature. In museum collections, this translates to an iterative additive labelling of physical objects (the verbatim labels are never changed) as well as an immaculate concurrent (digital) documentation [92]. Regarding BOLD, skilled personnel observing and incorporating current changes and novelties in the taxonomic backbone are crucial to uphold user confidence and integrity with regard to content. Despite the indisputable requirement of additional effort and resources, this accuracy and timeliness will ensure maximum reliability and use of reference barcode data (in the sense of voucher-related DNA sequences) as well as museum collections for future applications.
This barcode-based inventory of the Austrian fish fauna has brought some new additions [45,47,48,93], and while some of these novelties are shared with adjacent countries [e.g. 10,33], others are original to Austria [45], underscoring the need to update the national Red List. We argue that national red lists should increasingly be augmented by genetic data [10,94-96], which allow for non-invasive monitoring [54] and might illuminate the need for further detailed ecological or systematic study of problematic or ambiguous taxa [31,32]. Here, we provide the first comprehensive DNA barcode reference set for Austrian fishes, which may serve as a basis for a regularly updated Austrian Red List of fish species, aid in sample/specimen identification for both basic and applied monitoring, provide the basis for sound fisheries management and conservation of native fish populations and facilitate read determination in eDNA or meta-barcoding studies. Furthermore, our data update helps to increase the coverage of barcoding data at the European scale and thus will likely be useful in a wider biogeographic context.